Abstract

Background: Dent disease 1 is a hereditary disorder of renal tubular epithelial function associated with mutations in the CLCN5 gene, which encodes the ClC-5 Cl−/H+ antiporter. All reported disease-causing mutations are located in the coding region except for one recently identified in the 5'UTR region of a single patient. This finding highlighted a possible role for genetic variability in this region in the pathogenesis of Dent disease 1. The structural complexity of the CLCN5 5'UTR region has not yet been fully characterized. To date, 6 different alternatively used 5' exons (1a, 1b, 1b1 and I-IV, with exon II alternatively spliced into IIa and IIb) have been described, but their significance and differential expression in the human kidney have not been investigated. Our aim was therefore to better characterize the CLCN5 5'UTR region in the human kidney and other tissues.

Methods: To clone more of the 5' end portion of the human CLCN5 cDNA, RNA ligase-mediated rapid amplification of cDNA 5' ends was applied using total human kidney RNA as template.

The expression of the different CLCN5 isoforms was studied in the kidney, in leucocytes and in other tissues by quantitative comparative RT/PCR and Real-Time RT/PCR.

Results: Eleven transcripts initiating at 3 different nucleotide positions and driven by 3 distinct promoters of varying strength were identified. Previously identified 5'UTR isoforms were confirmed, but their ends were extended. Six additional 5'UTR ends characterized by new untranslated exons (c, V and VI) were also identified. Alternative splicing of exon c gives rise to exon c.1. The kidney is the only tissue that expresses all isoforms, and the isoform containing exon c appears to be kidney specific. The most abundant isoforms contain exon 1a, exon IIa, and exons 1b1 and c. ORF analysis predicts that all isoforms except 3 encode the canonical 746-amino-acid ClC-5 protein.

Conclusions: Our results confirm the structural complexity of the CLCN5 5'UTR region. Characterization of this crucial region could not only allow a clear genetic classification of a greater number of Dent disease patients but also provide the basis for investigating as yet unexplored functions of the ClC-5 proton exchanger.
Background

Dent disease 1 (OMIM 300009) is an X-linked recessive disorder of renal tubular epithelial function associated with mutations in the CLCN5 gene, which encodes the ClC-5 Cl−/H+ antiporter. The disease is characterized by low-molecular-weight proteinuria, hypercalciuria, nephrocalcinosis, nephrolithiasis and one or more features of proximal tubular dysfunction (glycosuria, aminoaciduria, phosphaturia, etc.) [1]. ClC-5 is expressed in the kidney, particularly in proximal tubular cells and intercalated cells of the collecting duct. The human CLCN5 gene, spanning about 170 kb of genomic DNA on chromosome Xp11.23/p11.22, consists of 17 exons, including 11 coding exons (2-12) and 6 different alternatively used 5' exons (5'UTR), some of which remain untranslated [1-4]. Transcripts including the untranslated exon 1a (NM_000084.4: mRNA variant 3) [2] or exon 1b (NM_001282163.1: mRNA variant 4) [3] are spliced to exon 2, which contains the ATG start codon. A third mRNA comprises a larger exon 1b and retains intron 1 (exon 1b1) (alternative mRNA variant 4) [4]. Subsequently, two additional long transcripts arising from alternative splicing of exon II and including exons I to IV were also identified (NM_001127899.3: mRNA variant 1 and NM_001127898.3: mRNA variant 2) [5]. Both transcripts carry the ATG start codon in exon III and thereby encode an NH2-terminally extended ClC-5 isoform of 816 amino acids instead of the canonical 746-amino-acid protein.

Although more than 150 different mutations have been reported within the CLCN5 coding exons, our group and others have genotyped patients with typical symptoms of Dent disease 1 in whom no mutations could be detected [6-9]. The presence of many different 5'UTR ends of CLCN5 mRNA in the kidney highlights the complexity of both the molecular structure and the regulatory apparatus of the gene. Moreover, the 5'UTR might harbour Dent disease 1-causing mutations or polymorphisms that influence disease expression. Indeed, while analysing 30 CLCN5 mutation-negative patients, our group identified a nucleotide substitution in the 5' untranslated exon 1b1 of one individual which appeared to be disease-causing, since it was not detected in 471 normal X chromosomes [10].

The functional significance of these regulatory regions has not been elucidated. Their differential expression in the kidney versus other human tissues such as brain, skeletal muscle and eye has also not been assessed, despite the potential involvement of these organs in Dent disease 1 [11]. We therefore decided to study this region in depth. Here we report the identification of additional 5'UTR ends of human CLCN5 cDNA within the kidney, including newly identified exons. In total, eight exons are now known to be present in the 5'UTR region of the CLCN5 gene, giving rise to eleven isoforms. Moreover, we extended the 5' ends of the previously known CLCN5 transcripts and identified new transcription start sites. These novel CLCN5 mRNA species are shown to be differentially expressed in kidney and other human tissues.

Methods 

RNA ligase-mediated rapid amplification of cDNA 5' ends PCR

The GeneRacer kit (Invitrogen) was used in accordance with the manufacturer's instructions to obtain clones containing the 5' portion of the human CLCN5 cDNA. In brief, 5 µg of total human kidney RNA (Stratagene) was treated with calf intestinal phosphatase to remove 5' phosphates. This step does not affect capped full-length mRNA but prevents non-mRNA or truncated mRNA from taking part in the subsequent ligation reaction. The sample was then treated with tobacco acid pyrophosphatase to remove the 5' mRNA cap structure, which leaves the 5' phosphate required for ligation to the GeneRacer RNA Oligo (5'-CGACUGGAGCACGAGGACACUGACAUGGACUGAAGGAGUAGAAA-3'). The GeneRacer RNA Oligo was ligated to the 5' end of the decapped mRNA with T4 RNA ligase, providing a known priming site for the GeneRacer PCR primers. Reverse transcription was then performed using SuperScript III Reverse Transcriptase and the GeneRacer Oligo dT Primer (5'-GCTGTCAACGATACGCTACGTAACGGCATGACAGTG(T)24-3') provided in the kit. The CLCN5 5' cDNA ends were amplified by touchdown PCR using a CLCN5 gene-specific antisense primer (rRACE Ex 2), the GeneRacer 5' primer and Platinum Taq DNA Polymerase High Fidelity (Invitrogen). Cycling conditions followed the manufacturer's protocol. After successful amplification was confirmed by agarose gel electrophoresis, a second round of nested PCR was performed as above, except that the GeneRacer 5' nested primer and the CLCN5 gene-specific nested antisense primers replaced the GeneRacer 5' primer and the CLCN5 rRACE Ex 2 primer, respectively (see Additional file 1). The PCR products were cloned into the plasmid vector pCR4-TOPO and transformed into One Shot TOP10 competent cells with a TOPO TA cloning kit (Invitrogen) following the supplier's protocol. Colonies were screened by PCR using vector-specific primers (M13F: 5'-GTAAAACGACGGCCAG-3'; M13R: 5'-CAGGAAACAGCTATGAC-3'). In total, 172 clones were sequenced using vector-specific primers.
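Because the GeneRacer RNA Oligo is ligated only to decapped 5' ends, the first cDNA base after the oligo sequence in each sequenced clone marks a candidate transcription start site. The Python sketch below illustrates that bookkeeping; it is not the authors' analysis pipeline. It assumes that reads are already in sense orientation and contain at least the last 20 nt of the oligo, and the helper names `insert_start` and `tally_start_sites` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional

# GeneRacer RNA Oligo from the Methods, converted to DNA (U -> T) because
# the sequenced clones carry a DNA copy of the ligated oligo.
GENERACER_OLIGO = "CGACUGGAGCACGAGGACACUGACAUGGACUGAAGGAGUAGAAA".replace("U", "T")


def insert_start(read: str, anchor_len: int = 20) -> Optional[str]:
    """Return the cDNA sequence immediately 3' of the ligated oligo.

    Assumes a sense-orientation read containing the last `anchor_len` nt
    of the oligo; returns None if that anchor is not found.
    """
    anchor = GENERACER_OLIGO[-anchor_len:]
    read = read.upper()
    idx = read.find(anchor)
    if idx == -1:
        return None
    return read[idx + anchor_len:]


def tally_start_sites(reads: Iterable[str], prefix_len: int = 15) -> Counter:
    """Group clones by the first `prefix_len` cDNA bases after the junction.

    Each distinct prefix stands for one candidate transcription start site.
    """
    counts: Counter = Counter()
    for read in reads:
        tail = insert_start(read)
        if tail:  # skip reads without the oligo or without an insert
            counts[tail[:prefix_len]] += 1
    return counts


# Hypothetical clone reads (synthetic sequences, not CLCN5 data)
reads = [
    GENERACER_OLIGO + "AGCTTACGGATCCAAGT",
    GENERACER_OLIGO + "AGCTTACGGATCCAAGT",
    GENERACER_OLIGO + "GGGGCCCCAAAATTTT",
    "ACGTACGTACGT",  # no oligo: discarded
]
print(tally_start_sites(reads).most_common())
```

Reads sequenced from the opposite vector primer would first need to be reverse-complemented; a fixed-length prefix is used here only as a simple key for grouping clones that start at the same position.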

Sequencing 

The cDNA clones and RT/PCR products were analysed by direct Sanger sequencing. PCR products were purified with the MinElute PCR Purification Kit (Qiagen), sequenced with the BigDye Terminator v1.1 Cycle Sequencing Kit (Applied Biosystems), and purified with Centrisep Columns (Princeton Separations), all according to the manufacturers' manuals. Sequences were analysed on an ABI PRISM 3100 Genetic Analyzer (Applied Biosystems). The NCBI BLAST 2 Sequences alignment program was used to compare the CLCN5 5'UTR cDNAs and coding region with the corresponding genomic sequences. The human CLCN5 mRNAs (GenBank accession numbers NM_001127899.3, NM_001127898.3, NM_000084.4, NM_001282163.1 and NM_001272102.1), the human CLCN5 DNA sequence on chromosome X (GenBank accession number NG_007159.3) and the CLCN5 promoter region (GenBank accession number AB020597.1) were used for comparison.

Quantitative comparative RT/PCR analysis of CLCN5 mRNA in different human tissues

All patients gave their informed, written consent. Total RNA from leucocytes of healthy human subjects was extracted using the PAXgene Blood RNA Kit (Qiagen) according to the manufacturer's instructions. Blood samples were collected with the approval of the Ethics Committee of the Azienda Ospedaliera of Padova. One µl of RNA was used for spectrophotometric quantification at 260 and 280 nm with a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies), and RNA integrity was checked with the Agilent 2100 Bioanalyzer (Agilent Technologies). Total RNA from normal human tissue samples (kidney, brain, lung, liver, colon, placenta, testes, skeletal muscle) and from endothelial cells was purchased from Stratagene.

To analyze the differential expression of CLCN5 5'UTR ends, a set of specific primers was designed (see Additional file 2). The primers Ex I/II F and Ex IV/2 R were those used by Ludwig et al. [5]. Two hundred nanograms of total RNA were reverse-transcribed in a total volume of 20 µl containing 5 mM MgCl2, 1 mM dNTPs (Roche Diagnostics), 2.5 µM random hexamers (Applied Biosystems), 1 U RNase inhibitor (Applied Biosystems) and 2.5 MuLV reverse transcriptase (Applied Biosystems) in a buffer of 50 mM KCl, 10 mM Tris-HCl pH 8.3. The reaction was carried out at 42°C for 30 min followed by 5 min at 99°C. An aliquot (1.5 µl) of the RT reaction was used to amplify each alternative CLCN5 mRNA species in a final volume of 25 µl containing 1.5, 2 or 3 mM MgCl2, 0.2 mM dNTPs (Roche Diagnostics), 0.4 µM primers and 0.04 U JumpStart Taq (Sigma-Aldrich) in 50 mM KCl, 10 mM Tris-HCl pH 8.3. The amplification profile for each primer set consisted of an initial denaturation at 95°C for 5 min, followed by a primer-set-specific number of amplification cycles (45 s at 94°C, 45 s at the specific annealing temperature Ta, 1 min at 72°C), and a final extension at 72°C for 7 min. PCR conditions are given in Additional file 2.
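Assembling the 25 µl PCR from concentrated stocks is a C1V1 = C2V2 calculation. The Python sketch below is illustrative only: the stock concentrations (10x buffer, 25 mM MgCl2, 10 mM dNTPs, 10 µM of each primer) are assumptions not given in the text, 0.4 µM is treated as the concentration of each primer, and the Taq volume is left out because its stock concentration is not stated.

```python
def stock_volume(final_conc: float, final_volume_ul: float, stock_conc: float) -> float:
    """Stock volume (µl) from C_stock * V_stock = C_final * V_final.

    final_conc and stock_conc must be expressed in the same unit.
    """
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the final reaction")
    return final_conc * final_volume_ul / stock_conc


def pcr_setup(mgcl2_mM: float, final_volume_ul: float = 25.0,
              template_ul: float = 1.5) -> dict:
    """Per-reaction volumes (µl) for the 25 µl RT/PCR described above.

    Stock concentrations are hypothetical assumptions, not from the paper.
    Taq is not included because its stock concentration is not stated.
    """
    v = {
        "10x buffer": stock_volume(1, final_volume_ul, 10),
        "MgCl2 (25 mM stock)": stock_volume(mgcl2_mM, final_volume_ul, 25),
        "dNTPs (10 mM stock)": stock_volume(0.2, final_volume_ul, 10),
        "forward primer (10 uM stock)": stock_volume(0.4, final_volume_ul, 10),
        "reverse primer (10 uM stock)": stock_volume(0.4, final_volume_ul, 10),
        "RT reaction": template_ul,
    }
    # Water makes up the remainder; the Taq volume would come out of this.
    v["water (before Taq)"] = final_volume_ul - sum(v.values())
    return v


for mg in (1.5, 2, 3):  # the three MgCl2 concentrations used in the text
    print(mg, "mM MgCl2:", pcr_setup(mg))
```

Only the MgCl2 and water volumes change between the three magnesium conditions, so in practice a shared master mix without MgCl2 could be split three ways.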

To quantify the relative expression of each 5'UTR end, a semiquantitative comparative RT/PCR approach was used, with GAPDH as the housekeeping gene [12]. RT/PCR products were analysed by 7% polyacrylamide gel electrophoresis followed by silver staining as previously described [12]. One µl of each product was also analysed and quantified using Agilent Bioanalyzer technology (Agilent Technologies). Negative control reactions, in which reverse transcriptase was omitted from the cDNA synthesis step, were performed to rule out genomic DNA contamination.

Real-Time PCR of CLCN5 mRNA in the human kidney 

Real-Time quantitative PCR was performed to analyze the expression levels of all CLCN5 mRNA species in the human kidney. Changes in CLCN5 mRNA levels were determined by relative quantitative Real-Time PCR on an iCycler Thermal Cycler (Bio-Rad). The reaction was carried out in a final volume of 25 µl containing 1 µl of RT reaction, 1x iQ SYBR Green Supermix (Bio-Rad) and 0.3 µM of primers specific for the 5'UTR exons. To avoid amplification of contaminating genomic DNA, primers spanning exon-exon boundaries were designed. It was not possible to design primer pairs specific for every isoform; for this reason, isoforms containing exon IIa were quantified together as a group, as were isoforms containing exon IIb (see Additional file 3). Samples were loaded in triplicate, 25 µl/well, in 96-well plates (Bio-Rad). Negative control reactions were performed to rule out contamination.

The thermal cycling profile was the same for each primer set and consisted of an initial denaturation at 95°C for 5 min, followed by 45 amplification cycles of 10 s at 95°C and 45 s at 66°C, and then 1 cycle at 60°C for 1 min. Melting curve analysis, consisting of 80 cycles of 10 s starting at 60°C, was used to confirm the specificity of the amplification products. GAPDH was used as the housekeeping gene (internal control). A positive cDNA control, the most abundantly expressed isoform (mRNA variant 3), was also used as a reference.

The relative expression of each 5'UTR isoform was normalized to GAPDH by calculating ΔCt = Ct(isoform) - Ct(GAPDH). The ΔCt gives a rough estimate of the fold change in gene expression [13].
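The normalization can be written out as a short Python sketch. It averages triplicate Ct values, computes ΔCt against GAPDH, and then expresses each isoform relative to the calibrator (mRNA variant 3) as 2^-ΔΔCt. The ΔΔCt step and the assumption of roughly 100% amplification efficiency for every primer pair follow the usual comparative Ct convention and are assumed here rather than stated in the text; the Ct values in the example are invented.

```python
from statistics import mean


def delta_ct(ct_isoform, ct_gapdh):
    """ΔCt = mean Ct(isoform) - mean Ct(GAPDH) over technical replicates."""
    return mean(ct_isoform) - mean(ct_gapdh)


def relative_expression(dct_isoform, dct_calibrator):
    """Fold expression relative to the calibrator, 2 ** -(ΔCt_iso - ΔCt_cal).

    Assumes ~100% PCR efficiency for both primer pairs (assumption).
    """
    return 2 ** -(dct_isoform - dct_calibrator)


# Hypothetical triplicate Ct values (not data from this study)
gapdh = [18.1, 18.0, 17.9]
variant3 = [24.0, 24.1, 23.9]  # calibrator
variant7 = [26.0, 26.2, 25.8]

dct_v3 = delta_ct(variant3, gapdh)
dct_v7 = delta_ct(variant7, gapdh)
print(relative_expression(dct_v3, dct_v3))  # calibrator relative to itself
print(relative_expression(dct_v7, dct_v3))
```

On a log10 axis such as that of Figure 5, the calibrator then sits at 10^0 and an isoform two cycles later sits at log10(0.25), about -0.6.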

Bioinformatic analyses 

The NNSPLICE (http://www.fruitfly.org/seq_tools/splice.html) [14], NetGene2 (http://www.cbs.dtu.dk/services/NetGene2/) [15] and GeneSplicer (http://www.cbcb.umd.edu/software/GeneSplicer/gene_spl.shtml) [16] programs were used to identify potential splice sites.

Open reading frames and the encoded amino acid sequences were predicted with the ORF Finder program (http://www.ncbi.nlm.nih.gov/projects/gorf/).
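ORF Finder is an NCBI web tool; the Python sketch below is only a simplified stand-in that illustrates the idea on the forward strand: scan each of the three frames for an ATG, extend to the first in-frame stop codon, and keep the longest ORF. It ignores reverse-strand ORFs, alternative start codons and ORFs lacking a stop codon, which a full ORF finder would also consider. The function name `longest_orf` and the toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str):
    """Return (start, end, n_amino_acids) of the longest ATG-initiated ORF on
    the forward strand, or None. `end` is exclusive and includes the stop
    codon; n_amino_acids excludes the stop. ORFs without a stop are ignored.
    """
    seq = seq.upper().replace("U", "T")
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF; later ATGs are internal Met
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if best is None or n_aa > best[2]:
                    best = (start, i + 3, n_aa)
                start = None
    return best


# Toy sequences (not CLCN5): an in-frame ATG 9 nt upstream adds 3 residues
core = "ATG" + "GCT" * 4 + "TAA"
extended = "ATG" + "AAA" * 2 + core
print(longest_orf(core), longest_orf(extended))
```

By the same logic, an in-frame upstream ATG 210 nt before the canonical start codon lengthens the predicted protein by 70 amino acids, which is the difference between the 816- and 746-residue ClC-5 isoforms.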

The online tool Human Splicing Finder version 2.4.1 (http://www.umd.be/HSF/HSF.html) was used to identify splicing motifs in the human sequence of interest [17]. The mRNAfold web server (http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi) was used to predict the pre-mRNA secondary structure [18,19].

The CLCN5 promoter region was analysed using data from the ENCODE (Encyclopedia of DNA Elements) project [20] in the UCSC Genome Browser to define transcriptional regulatory regions.

Results 

Structure of CLCN5 5'UTR region 

RACE analysis of CLCN5 5' cDNA ends in the human kidney detected eleven transcripts initiating at three different nucleotides: -2407 and -1426 relative to the ATG initiation codon in exon 2, and -660 relative to the ATG initiation codon in exon III. The analysis not only confirmed the presence of previously identified UTR isoforms (mRNA variants 1-4) but also extended their ends and identified new transcription start sites. A transcript initiating at nucleotide -2407 corresponds to mRNA variant 3 but is extended by a further 41 bp relative to the +1 origin reported in the NCBI reference sequence NM_000084.4. This result agrees with in silico analysis conducted with NNSPLICE, NetGene2 and GeneSplicer [14-16]. All of these programs predict a consensus sequence (5'-tcccag^GC-3') containing the conserved "AG" motif 41 bp upstream of exon 1a, which constitutes a potential high-strength acceptor splice site. The same programs do not identify the previously described acceptor site.

We also confirmed the presence of the alternative mRNA variant 4, although it is longer than described by Forino et al. [4]. It retains intron 1b, consists of an extended exon 1b (exon 1b1) with a new putative transcription start site located at nucleotide -1426 in intron 1a, 1001 nt upstream of the beginning of exon 1b, and is 1379 bp long. In silico analysis also detected a candidate acceptor splice site containing the conserved "AG" motif (5'-ctacag^AT-3') corresponding to the beginning of the transcript identified by RACE analysis (Figure 1).

We also extended the 5' end of mRNA variant 4 by a further 870 nt relative to the +1 origin reported in the NCBI reference sequence NM_001282163.1, but it was not possible to obtain the full-length cDNA. We therefore hypothesize that mRNA variant 4 and alternative mRNA variant 4 share the same transcription start site and differ only in the absence or presence of intron 1b (Figure 1).

We confirmed the presence of mRNA variants 1 and 2, identified by Ludwig et al. [5], with the alternatively spliced exons IIa and IIb, which, according to the NCBI reference sequences NM_001127899.3 and NM_001127898.3, contain the transcription start site at nucleotide -660 relative to the ATG initiation codon in exon III.

RACE analysis detected six new 5'UTR ends of human CLCN5 cDNA. One isoform contains a new 287 bp exon, named exon c, located within intron 1a according to the classical 5'UTR structure; considering the new structure of exon 1b1 revealed by our RACE analysis, this isoform represents an alternatively spliced form of the alternative mRNA variant 4. Exon c contains a palindromic region of 28 nucleotides and, as a result of alternative 5' splicing, gives rise to two different mRNAs, containing exon c.1 of 102 bp (donor site 5'-TT^gtaagt-3') and exon c (donor site 5'-AG^gttggt-3'), respectively (Figure 2). We called these variants 6 and 7. Splice site analysis and consensus values calculated with the Shapiro and Senapathy matrices [21] show that the two donor splice sites have similar consensus values (79.4 for c.1 and 79.2 for c), indicating similar strength, so they likely compete for splicing factors.
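The consensus values quoted above come from position-specific nucleotide frequency matrices. Following Shapiro and Senapathy, the score is 100 × (T - Tmin)/(Tmax - Tmin), where T sums the percentage frequency of the observed nucleotide at each position and Tmin and Tmax sum the lowest and highest frequencies at each position. The Python sketch below implements that formula; the three-position matrix in the example is a toy, not the published donor-site matrix, which would have to be supplied to reproduce values such as 79.4 or 79.2.

```python
def consensus_value(site: str, freq: list) -> float:
    """Shapiro-Senapathy consensus value (0-100) for a candidate splice site.

    freq holds one dict per position mapping each nucleotide to its
    percentage frequency at that position in a reference set of sites.
    """
    site = site.upper()
    if len(site) != len(freq):
        raise ValueError("site length must equal the number of matrix positions")
    t = sum(col[nt] for nt, col in zip(site, freq))
    t_max = sum(max(col.values()) for col in freq)
    t_min = sum(min(col.values()) for col in freq)
    return 100 * (t - t_min) / (t_max - t_min)


# Toy 3-position matrix (hypothetical percentages, NOT the published matrix)
toy = [
    {"A": 60, "C": 10, "G": 20, "T": 10},
    {"A": 5, "C": 5, "G": 80, "T": 10},
    {"A": 10, "C": 10, "G": 10, "T": 70},
]
for s in ("AGT", "GGT", "CAC"):
    print(s, round(consensus_value(s, toy), 1))
```

A site matching the most frequent base at every position scores 100 and one matching the rarest base everywhere scores 0, so two sites scoring 79.4 and 79.2 are, by this measure, almost equally close to the consensus.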

Owing to alternative splicing of exon II, four new long transcripts were detected: two contain exon VI (variants 8 and 9) and the other two contain exons V and VI (variants 10 and 11). The new exons V and VI (131 bp and 194 bp long, respectively) are located 7820 bp and 11977 bp downstream of exon IV, respectively (Figure 3). Bioinformatic tools identified the acceptor (5'-tgtcag^AG-3') and donor (5'-AG^gtaagc-3') splice sites flanking exon VI, but only the donor splice site (5'-AG^gtatgt-3') for exon V.

We were unable to detect mRNA variant 5 (NM_001272102.1). From these results we conclude that the human CLCN5 gene comprises at least 20 exons, eight of them in the 5'UTR region, with transcription initiating from at least three different start sites. As a result of alternative 5' splicing in some exons, 11 different mRNAs are generated (Figure 4). The complete sequences of the extended 5'UTR ends of the known CLCN5 transcripts (types 3, 4 and alternative 4) and of the newly identified CLCN5 5'UTR variants (types 6-11) are given in Additional file 4.

Figure 1 Characterization of the CLCN5 5'UTR variant 4 and alternative variant 4 by RACE PCR and sequencing. +1 indicates the putative transcription start site of alternative variant 4, located in intron 1a, 1001 nt upstream of the first nucleotide of exon 1b described by Fisher et al. [2]. For variant 4 it was not possible to obtain the full-length cDNA, but the two variants probably have the same transcription start site and differ only in the absence or presence of intron 1b. Coloured boxes represent exons and open boxes represent introns.

Figure 2 Characterization of the CLCN5 5'UTR variants 6 and 7 by RACE PCR and sequencing. +1, located in intron 1a, indicates the putative transcription start site of mRNA variant 7, which is the same as that of alternative variant 4. Through alternative 5' splicing, exon c (287 bp) gives rise to two different mRNAs containing exon c.1 of 102 bp (variant 6) and exon c of 287 bp (variant 7). Coloured boxes represent exons and open boxes represent introns.

Expression of CLCN5 5'UTR ends in different human tissues

Normal human tissues (kidney, brain, lung, liver, colon, placenta, testes, skeletal muscle), endothelial cells and peripheral leucocytes from healthy individuals were used to evaluate the presence and expression of the different CLCN5 transcripts. Both the translated region common to all isoforms (exons 3-6) and the 5'UTR ends were analyzed. Quantitative comparative RT/PCR analysis detected CLCN5 mRNA in all investigated tissues (Table 1 and Additional file 5).

Using primers specific for each 5'UTR isoform, we showed that some isoforms are not present in all tissues and that the kidney is the only tissue in which all isoforms are expressed (Table 1). Analysis of the distribution of CLCN5 5'UTR ends revealed that the previously known mRNA variants 1-4 and alternative variant 4 are expressed in all tissues at variable levels: mRNA variant 3 is the most abundant in all tissues except the lung, and mRNA variant 4 is the least abundant in all tissues except the colon. mRNA variants 1 and 2 are present in all tissues, and the transcript containing exon IIa is much more abundant than that with exon IIb.
Figure 3 Characterization of the new CLCN5 5'UTR long transcripts by RACE PCR and sequencing. The four long transcripts (variants 8-11) contain the previously described exons I-IV and the new exons V and VI (131 bp and 194 bp long, respectively), located 7820 bp and 11977 bp downstream of exon IV, respectively. +1 indicates the putative transcription start site located at nucleotide -660 relative to the ATG initiation codon in exon III. Coloured boxes represent exons and open boxes represent introns.

Figure 4 Genomic organization of the CLCN5 5'UTR region. The CLCN5 5'UTR region consists of 8 different alternatively used 5' exons, some of which remain untranslated. As a result of alternative 5' splicing (represented by two vertical bars) in exons II and 1b, 11 different mRNAs are generated. Transcription initiates from three different start sites (represented by arrows), and there are three translation start sites (represented by inverted triangles). Coloured boxes represent exons and the lines connecting them represent introns.



The newly identified variants 6 and 7 are the most heterogeneous across tissue samples: both are expressed in the kidney, only variant 6 is expressed in liver and lung, and neither was detected in the remaining tissues. Variant 7 therefore appears specific to the kidney. The four new long 5'UTR isoforms containing exon VI, or exons V and VI, also show variable expression levels between tissues. The transcripts with exon VI (variants 8 and 9) are present in all tissues except liver, lung and endothelial cells, whereas the transcripts with exons V and VI (variants 10 and 11) are expressed only in kidney, colon and testes. In both cases the mRNAs with exon IIa are much more abundant than those with exon IIb (Table 1).

Table 1 Results of RT/PCR analysis of 5' CLCN5 mRNA variants

Column headers (partially recoverable): mRNA variants 3, 4, Alternative 4, 6, ..., 10, ...; CLCN5 common region. Values are listed per tissue in column order; empty cells are not shown.

Tissue | Relative expression
Liver | +++, +/-, +++, ++, +, +++
Lung | +++, +/-, +, ++, +, ++
Placenta | +++, +, ++, +, +++, +
Testes | +++, +, +++, +++, +++, +++, +++
Skeletal muscle | +++, +, ++, ++, +++, +++
Brain | ++, +, +++, +++, +++, ++
Colon | +++, +, +++, +, +++, ++, ++, +, +, +++
Endothelial cells | +++, +/-, ++, +, +
Kidney | +++, +, +++, +++, ++, +++, +++, ++, ++, +, +++
Leucocytes | ++, +, ++, ++, +++, ++

The table reports the presence/absence of all identified variants and their relative expression in different human tissues and cells.

In summary, the different 5'UTR ends have variable expression levels from tissue to tissue. The tissues most similar in isoform expression, in terms of both abundance and expression pattern, are kidney, colon and testis. mRNA variants 3 and 2, alternative variant 4 and variant 7 appear to be the most abundant in these three tissues.

Real-Time PCR quantification of the CLCN5 mRNA species in the human kidney

Finally, the levels of all identified 5'UTR isoforms were quantified in human kidney tissue using Real-Time PCR. The isoforms have very different expression levels: mRNA variant 3, which was used as a calibrator to calculate the relative abundance of the other isoforms, is the most abundant; all the mRNA variants containing exon IIa (variants 2, 8 and 10) and variant 7 have slightly lower levels (Figure 5).

In general, the results of the Real-Time experiments agreed with those of the RACE analysis. Of the 172 cloned 5' RACE fragments, 77 (45%) contained type 3 mRNA species, 46 (27%) type 7, 34 (20%) type 2, and 6 (3.5%) alternative type 4.
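The clone percentages follow directly from the reported counts. The short Python check below recomputes them from the numbers in the text and also shows that these four types account for 163 of the 172 clones; the helper name `clone_percentages` is hypothetical.

```python
def clone_percentages(counts: dict, total: int) -> dict:
    """Percentage of sequenced clones assigned to each mRNA type."""
    return {name: 100 * n / total for name, n in counts.items()}


# Clone counts as reported in the text
race_clones = {"type 3": 77, "type 7": 46, "type 2": 34, "alternative type 4": 6}
TOTAL_CLONES = 172

for name, pct in clone_percentages(race_clones, TOTAL_CLONES).items():
    print(f"{name}: {pct:.1f}%")
print("clones not assigned to these four types:",
      TOTAL_CLONES - sum(race_clones.values()))
```

The unrounded values (44.8%, 26.7%, 19.8% and 3.5%) round to the figures given above.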

Discussion 

Our results show that the human CLCN5 gene comprises at least 20 exons, eight of them in the 5'UTR region, with transcription initiating from at least three different start sites. As a result of alternative 5' splicing in some exons, 11 different mRNAs are generated. Our findings highlight the structural complexity of the CLCN5 5'UTR region in renal and extrarenal tissues and suggest that this region is likely involved in ClC-5 expression and Dent disease pathogenesis. Although some authors [5,8] have screened the promoter region in their patients without detecting variants, the more detailed characterization obtained in our study should make it possible to search regions of the gene never analysed before for rare variants that may act as disease-causing mutations or modifier alleles.

To further characterize the functional organization of the gene, the 5'-flanking regions of exons 1a, 1b1 and I were analysed for possible promoter regions and transcription factor binding sites. We used data from the ENCODE project, which aims to delineate all functional elements encoded in the human genome sequence, including histone modifications, transcription factor (TF) binding sites mapped by chromatin immunoprecipitation (ChIP), and transcriptional regulatory regions. Specifically, the Transcription track, the Overlayed H3K4Me1 and Overlayed H3K27Ac tracks, the DNase Clusters and the Txn Factor ChIP tracks were considered. These tracks complement each other and together can shed considerable light on regulatory DNA [20].

The results suggest that the CLCN5 gene contains 3 functional promoters of different strength, which give rise to all isoforms with varying efficiency. Strong promoters are present upstream of exon 1a and exon I, and indeed variant 3 and variants 1 and 2 are the mRNA species most highly expressed in the kidney (Figure 5). Both promoters lack characteristic features of eukaryotic promoters but instead contain consensus binding sites for transcription factors. For mRNA variants 1 and 2, GATA1 and GATA2 binding sites are present, together with consensus binding sites for other transcription factors including E2F1 (E2F transcription factor 1), ZNF263 (zinc finger protein 263), Nrf1 (nuclear respiratory factor 1), HMGN3 (high mobility group nucleosomal binding domain 3), USF1 (upstream transcription factor 1) and Ini1 (RING finger-like protein Ini1). For mRNA variant 3, the identified binding sites are for the transcription factors USF1, USF2 (upstream transcription factor 2, c-fos interacting), KAP1 (KRAB-associated protein 1, also known as TRIM28) and CTCF (CCCTC-binding factor). All of these transcription factor binding sites were identified in a kidney cell line (HEK293). A weaker promoter appears to control the expression of mRNA variant 4, alternative variant 4 and variants 6 and 7. This promoter contains consensus binding sites for transcription factors such as FOXA2 (forkhead box A2) and SETDB1 (SET domain, bifurcated 1). For all promoters, the specific region containing the transcription factor binding sites overlaps with a DNase I-sensitive region. At the functional level, DNase hypersensitivity suggests that a region is very likely to be regulatory in nature, and promoters are particularly DNase sensitive.

Figure 5 Real-Time PCR quantification of CLCN5 isoforms in the human kidney. Data were normalized to the GAPDH housekeeping gene; the isoforms were quantified using mRNA variant 3, the most abundantly expressed, as the reference value (10^0). The y-axis reports expression values on a logarithmic scale.

The ENCODE data did not identify binding sites for the transcription factor HNF1α. By contrast, in silico analysis by Tanaka et al. [22] revealed numerous HNF1α binding sites in the 5' regulatory sequences of both the mouse Clcn5 and the human CLCN5 genes. Transactivation of the Clcn5/CLCN5 promoter by HNF1α was verified in vitro, and binding of HNF1α to the Clcn5 promoter in vivo was confirmed by chromatin immunoprecipitation in mouse kidney [22].

mRNA variant 4, alternative variant 4 and variants 6 and 7 share the same transcription start site but have different lengths, depending on which donor site is used. They are probably generated, with different efficiencies, by the use of multiple alternative 5' splice sites within a single exon. Exons of this type commonly originate from ancestral constitutive exons in which mutations inside the exon or in the flanking intron create new alternative splice sites that compete with the original one for splice site selection [23].

In the case of variants 6 and 7, the two alternative 5' splice sites in exon c have similar strength, so additional regulation is required. The balance between cis-acting elements, namely exonic splicing enhancers (ESE), exonic splicing regulatory sequences (ESR) and exonic splicing silencers (ESS), located immediately upstream of each splice site is probably the major factor governing the usage of each site [23,24]. To test this, we performed a bioinformatic analysis with Human Splicing Finder version 2.4.1 [17]. This analysis showed that two ESE, two ESR and six ESS are present in the first 15 nt upstream of the donor splice site of exon c.1, whereas eight ESE, three ESR and five ESS are present upstream of the donor splice site of exon c. Therefore, although both sites have similar strength, the higher density of ESE and ESR and the lower density of ESS upstream of the exon c donor site are expected to favour its use. Consistent with this observation, the expression levels of variant 7 are higher than those of variant 6 (Figure 5).
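One way to summarize this comparison is the ratio of enhancer-type motifs (ESE + ESR) to silencer motifs (ESS) upstream of each competing donor site, computed below from the counts reported above. The ratio is an illustrative summary introduced here, not a score produced by Human Splicing Finder, and it assumes that the exon c counts refer to a window of the same length as the 15 nt used for exon c.1.

```python
def enhancer_silencer_ratio(ese: int, esr: int, ess: int) -> float:
    """(ESE + ESR) / ESS; values above 1 mean enhancer-type motifs dominate."""
    if ess == 0:
        return float("inf")
    return (ese + esr) / ess


# Motif counts upstream of each donor site, as reported in the text
donor_sites = {"exon c.1": (2, 2, 6), "exon c": (8, 3, 5)}

for exon, (ese, esr, ess) in donor_sites.items():
    print(exon, round(enhancer_silencer_ratio(ese, esr, ess), 2))
```

The ratio is about 0.67 for exon c.1 and 2.2 for exon c, the same ordering as the observed abundance of variants 6 and 7.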

In the case of mRNA variant 4, the donor splice site of exon 1b has a higher strength (consensus value 82.4) and should therefore be favoured. This isoform also contains 10 ESE, 2 ESR and 6 ESS. However, this is not in agreement with our experimental results, because this isoform is barely expressed. Alternative variant 4, which ranks below mRNA variant 3 and mRNA variants 1, 2 and 8-11 in expression level, is characterized by the presence of exon 1b and the retention of intron 1b (exon 1b1), which, contrary to what usually happens, is not removed during processing of the primary transcript into mature mRNA. Other factors most likely play an important role in regulating its expression level. Exon 1b1 could represent the ancestral constitutive exon from which all the other exons (1b, c and c.1) originated (Figure 4). Indeed, this exon is usually present as the main product relative to the others (Figure 5).

The GC content around splice sites is closely associated with splice site usage [18,19,25]. We considered a 141-nucleotide region surrounding the donor splice sites of exons c.1, c and 1b (70 nucleotides upstream and downstream of the splice site). The donor splice site of exon c had the highest GC content (11 GC), followed by those of exon c.1 (7 GC) and exon 1b (5 GC).
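A minimal sketch of the GC-content comparison, assuming the window is centered on a single nucleotide at the donor splice site (0-based index `center`) with 70 nt on each side, which gives the 141-nt window described above. The function names are hypothetical, and no CLCN5 sequence is included here; any real comparison would need the actual genomic sequence around each donor site.

```python
def gc_window(seq: str, center: int, flank: int = 70) -> str:
    """Return `flank` nt on each side of `center` (0-based) plus the center itself,
    clipped at the sequence ends (141 nt when flank=70 and the site is not near an end)."""
    if not 0 <= center < len(seq):
        raise ValueError("center lies outside the sequence")
    start = max(0, center - flank)
    return seq[start:center + flank + 1]


def gc_count(seq: str) -> int:
    """Count G and C nucleotides (case-insensitive)."""
    s = seq.upper()
    return s.count("G") + s.count("C")


def gc_around_site(seq: str, center: int, flank: int = 70) -> tuple:
    """Return (GC count, GC fraction) for the window around a splice site."""
    window = gc_window(seq, center, flank)
    n = gc_count(window)
    return n, n / len(window)


if __name__ == "__main__":
    # Synthetic example only: 70 A, one G at the "site", then 70 C.
    toy = "A" * 70 + "G" + "C" * 70
    print(gc_around_site(toy, 70))  # 71 GC in a 141-nt window
```

Reporting both the absolute count and the fraction makes windows comparable when a site lies close to a sequence end and the window is clipped.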

The web server mRNAfold was then used to predict pre-mRNA secondary structure by calculating minimum free energy [18,19]. Local RNA secondary structures have been reported to affect splice site selection, with splice sites closest to the transcription start site forming more stable structures than those located more centrally in the RNA [18,19]. The minimum free energies calculated by the software were -44.70, -38.84 and -30.70 kcal/mol for the donor splice sites downstream of exons c, c.1 and 1b, respectively. Both the GC content and the free energy calculations are again consistent with our expression data: the isoform containing exon c is expressed at higher levels than those containing exons c.1 and 1b (Figure 5).

To conclude our characterization, we performed open reading frame (ORF) analysis with the ORF Finder program. This analysis revealed that mRNA variants 3, 6 and 7, alternative variant 4 and variants 8-11 encode the canonical 746-amino-acid ClC-5 protein, whereas variant 4 and variants 1-2 encode proteins with 20 and 70 additional in-frame amino acids, respectively. Notably, the presence of exon VI and/or exon V in the long transcripts stabilizes translation initiation at the ATG in exon 2 and adds no amino acids to the protein. This is the most common situation among genes with alternative promoters: rather than generating different protein isoforms, they produce mRNA variants that differ in transcription pattern and translation efficiency.
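The kind of search performed by ORF Finder can be sketched as follows. This is a simplified stand-in, not the ORF Finder algorithm: it scans only the forward strand, accepts only ATG as a start codon, and opens each ORF at the first ATG after the preceding in-frame stop, so an upstream in-frame ATG with no intervening stop lengthens the predicted protein. The function names are hypothetical and the sequences are synthetic.

```python
STOPS = {"TAA", "TAG", "TGA"}


def orfs(seq: str, min_aa: int = 1):
    """Yield (start, end, aa_len) for ATG-initiated ORFs on the forward strand.

    start is the 0-based index of the A of ATG; end is the index just past the stop
    codon; aa_len counts residues excluding the stop. In each frame, the first ATG
    after the previous stop opens the ORF, giving the longest ORF for that stop."""
    s = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(s) - 2, 3):
            codon = s[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                aa_len = (i - start) // 3
                if aa_len >= min_aa:
                    yield start, i + 3, aa_len
                start = None


def longest_orf(seq: str):
    """Return the longest ORF as (start, end, aa_len), or None if there is none."""
    return max(orfs(seq), key=lambda orf: orf[2], default=None)


if __name__ == "__main__":
    canonical = "ATG" + "GCT" * 9 + "TAA"  # 10-residue toy protein
    extended = "ATGAAAAAA" + canonical     # upstream in-frame ATG, no stop in between
    print(longest_orf(canonical))           # (0, 33, 10)
    print(longest_orf(extended))            # (0, 42, 13): 3 extra N-terminal residues
```

On the same logic, the 20 and 70 additional in-frame amino acids reported above would correspond to proteins of 766 and 816 amino acids, respectively, whereas an upstream exon that contributes no in-frame ATG leaves the 746-amino-acid product unchanged.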

The ClC-5 coding region was expressed in all human tissues examined. Our results agree with those of Steinmeyer et al. [26], who showed in the mouse that ClC-5 is predominantly expressed in the kidney but is also present in brain, liver, lung and testis. Unlike Ludwig et al. [5], but in agreement with Ramos-Trujillo et al. [27], we found ClC-5 in human liver, brain and skeletal muscle.

By contrast, not all 5'UTR isoforms are expressed in every tissue. mRNA variants 3, 2 and 7 and alternative variant 4 appear to be the most abundant in the human kidney. 5'UTR exons shared by the expressed isoforms are candidates for mutation analysis in Dent disease patients without genetic variation in the CLCN5 coding region. Polymorphisms or rare variants in these regions might also act as modifier alleles and could explain the phenotypic heterogeneity not only of Dent disease 1 but also of Dent disease 2. Indeed, we have shown that variants in the OCRL and CLCN5 genes may act in concert to determine Dent disease phenotype variability [28].

Despite the widespread expression of ClC-5, the Dent disease 1 phenotype is largely renal. The different 5'UTR ends present in various tissues may regulate gene expression differently in response to physiological and pathological stimuli, through mechanisms involving not only transcription but also translation efficiency. The 5'UTR is known to influence translational efficiency and to mediate translational inhibition, probably through interactions with the ribosome and with specific RNA-binding proteins, or through other elements contained in non-coding regions. CLCN5 mRNA levels may therefore not correspond to ClC-5 protein levels and actual ClC-5 function.

Also of note is the presence of ClC-5 in the human brain and skeletal muscle. Although CNS and muscle impairment is common in Lowe syndrome, it has not been described in Dent disease 1 [11]. We recently evaluated a patient carrying a CLCN5 mutation whose clinical symptoms suggested a Dent 2 phenotype or a mild Lowe phenotype (unpublished data). Our findings raise the possibility that some patients with disease-causing CLCN5 mutations might show extrarenal symptoms or a mild Lowe phenotype.

The tissues most similar in both abundance and expression pattern of CLCN5 5'UTR isoforms are the kidney, colon and testis. In rats and pigs, ClC-5 is known to be expressed in intestinal tissues that have endocytotic machinery [29,30]. In rats, as in renal proximal tubular cells and collecting duct intercalated cells, ClC-5 in intestinal and colonic epithelial cells is predominantly, if not exclusively, intracellular, located in densely packed endocytotic vesicles [29]. Some authors have investigated whether ClC-5 contributes to intestinal calcium absorption by directly regulating the expression of calcium transport proteins such as TRPV6 [30-33]. Although in humans intestinal calcium absorption takes place mainly in the small intestine, our data indirectly support the hypothesis that hypercalciuria in Dent disease may result from increased intestinal calcium absorption rather than decreased tubular reabsorption.

No phenotype associated with testicular dysfunction has been described so far in Dent disease patients. Future studies may be warranted to explore a possible role of ClC-5 in male infertility, analogous to the role of the CFTR gene [34], and to assess testicular function in Dent disease 1 patients.

Conclusions 

Our results confirm the structural complexity of the CLCN5 5'UTR region. The many different CLCN5 5'UTR ends, together with the selective use of alternative promoters, can affect whether, when and how the transcript is produced and translated, through the binding of different transcription factors and the regulation of translation efficiency. The significance of this complexity, in which several CLCN5 isoforms are differentially regulated in a tissue-specific manner, probably in relation to physiological and/or pathological conditions, remains to be clarified. This complexity might explain aspects of the Dent disease phenotype and pathogenesis, and might obscure disease-causing mutations or polymorphisms that influence disease expression. It will be interesting to analyze renal CLCN5 isoforms and their expression levels in both normal and pathological conditions to identify and understand their physiological and pathophysiological roles.